24 research outputs found

Data Structures and Algorithms for Data-Parallel Computing in a Managed Runtime

Author: Prokopec Aleksandar
Publication venue: Lausanne, EPFL
Publication date: 25/08/2014

Infoscience - École polytechnique fédérale de Lausanne

On a Near Optimal Work-Stealing Tree Data-Parallel Scheduler for Highly Irregular Workloads

Authors: Odersky Martin; Prokopec Aleksandar
Publication date: 01/10/2013

We present a work-stealing algorithm for runtime scheduling of data-parallel operations in the context of shared-memory architectures on data sets with highly-irregular workloads that are not known a priori to the scheduler. This scheduler can parallelize loops and operations expressible with a parallel reduce or a parallel scan. The scheduler is based on the work-stealing tree data structure, which allows workers to decide on the work division in a lock-free, workload-driven manner and attempts to minimize the amount of communication between them. A significant effort is given to showing that the algorithm has the least possible amount of overhead. We provide an extensive experimental evaluation, comparing the advantages and shortcomings of different data-parallel schedulers in order to combine their strengths. We show specific workload distribution patterns appearing in practice for which different schedulers yield suboptimal speedup, explaining their drawbacks and demonstrating how the work-stealing tree scheduler overcomes them. We thus justify our design decisions experimentally, but also provide a theoretical background for our claims.

Infoscience - École polytechnique fédérale de Lausanne

Achieving Efficient Work-Stealing for Data-Parallel Collections

Authors: Odersky Martin; Prokopec Aleksandar
Publication date: 19/04/2013

In modern programming, high-level data structures are an important foundation for most applications. With the rise of the multi-core era, there is a growing trend of supporting data-parallel collection operations in general purpose programming languages and platforms. To facilitate object-oriented reuse, these operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when used in problems with irregular workloads. Work-stealing is a proven load-balancing technique when it comes to irregular workloads, but general purpose work-stealing also suffers from abstraction penalties. In this paper we present a generic design of a data-parallel collections framework based on work-stealing for shared-memory architectures. We show how abstraction penalties can be overcome through callsite specialization of data-parallel operation instances. Moreover, we show how to make work-stealing fine-grained and efficient when specialized for particular data structures. We experimentally validate the performance of different data structures and data-parallel operations, achieving up to 60x better performance with abstraction penalties eliminated and 3x higher speedups by specializing work-stealing compared to existing approaches.

Infoscience - École polytechnique fédérale de Lausanne

Isolates, channels, and event streams for composable distributed programming

Authors: Odersky Martin; Prokopec Aleksandar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/04/2018

The actor model has been a model of choice for building reliable distributed systems. On one hand, it ensures that message-processing is serialized within each actor, preserving the familiar sequential programming model. On the other hand, programs written in the actor model are location-transparent. The model is sufficiently low-level to express arbitrary message protocols. Composing these protocols is the key to high-level abstractions. Unfortunately, it is difficult to reuse or compose message protocols with actors. Reactive isolates, proposed in this paper, simplify protocol composition with first-class typed channels and event streams. We compare reactive isolates and the actor model on concrete programs. We identify obstacles for composition in the classic actor model, and show how to overcome them. We then show how to build reusable, composable distributed computing components in the new model.

Infoscience - École polytechnique fédérale de Lausanne

Duet Benchmarking: Improving Measurement Accuracy in the Cloud

Authors: Bulej Lubomír; Farquet François; Horký Vojtěch; Prokopec Aleksandar; Tůma Petr
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/01/2020

We investigate the duet measurement procedure, which helps improve the accuracy of performance comparison experiments conducted on shared machines by executing the measured artifacts in parallel and evaluating their relative performance together, rather than individually. Specifically, we analyze the behavior of the procedure in multiple cloud environments and use experimental evidence to answer multiple research questions concerning the assumption underlying the procedure. We demonstrate improvements in accuracy ranging from 2.3x to 12.5x (5.03x on average) for the tested ScalaBench (and DaCapo) workloads, and from 23.8x to 82.4x (37.4x on average) for the SPEC CPU 2017 workloads.
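
The idea of evaluating paired runs together can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, for illustration only, that interference on the shared machine scales both concurrently running artifacts by the same factor. The function names (paired_ratios, relative_spread) and all timing values are hypothetical.

```python
import statistics


def paired_ratios(times_a, times_b):
    """Per-pair relative performance of two artifacts measured side by side."""
    if len(times_a) != len(times_b):
        raise ValueError("duet runs must come in pairs")
    return [a / b for a, b in zip(times_a, times_b)]


def relative_spread(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical interference factors; each one affects one paired run.
noise = [1.0, 1.4, 0.9, 1.8, 1.1]

# Assumption: both artifacts in a pair are slowed by the same factor.
times_a = [10.0 * n for n in noise]
times_b = [8.0 * n for n in noise]

ratios = paired_ratios(times_a, times_b)

print("spread of A measured alone:", relative_spread(times_a))
print("spread of paired A/B ratio:", relative_spread(ratios))
```

Under this idealized assumption the shared noise cancels in each ratio, so the paired ratio varies far less than either artifact's individual timings. Real interference is rarely perfectly symmetric, which is the assumption the abstract says the paper examines experimentally.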

arXiv.org e-Print Archive

Crossref

Containers and Aggregates, Mutators and Isolates for Reactive Programming

Authors: Haller Philipp; Odersky Martin; Prokopec Aleksandar
Publication date: 01/01/2014

Many programs have an inherently reactive nature imposed by the functional dependencies between their data and external events. Classically, these dependencies are dealt with using callbacks. Reactive programming with first-class reactive values is a paradigm that aims to encode callback logic in declarative statements. Reactive values concisely define dependencies between singular data elements, but cannot efficiently express dependencies in larger datasets. Orthogonally, embedding reactive values in a shared-memory concurrency model convolutes their semantics and requires synchronization. This paper presents a generic framework for reactive programming that extends first-class reactive values with the concept of lazy reactive containers, backed by several concrete implementations. Our framework addresses concurrency by introducing reactive isolates. We show through examples that our programming model is efficient and convenient to use.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

On Lock-Free Work-stealing Iterators for Parallel Data Structures

Authors: Odersky Martin; Petrashko Dmitry; Prokopec Aleksandar
Publication date: 13/02/2014

With the rise of multicores, there is a trend of supporting data-parallel collection operations in general purpose programming languages. These operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when applied to irregular workloads. Work-stealing is a proven technique for load balancing irregular workloads, but general purpose work-stealing also suffers abstraction penalties. We present a generic data-parallel collections design based on work-stealing for shared-memory architectures that overcomes abstraction penalties through callsite specialization of data-parallel operation instances. Moreover, we introduce work-stealing iterators that allow fine-grained and efficient work-stealing for particular data-structures. By eliminating abstraction penalties and making work-stealing data-structure-aware we achieve up to 60x better performance compared to JVM-based approaches and 3x speedups compared to tools such as Intel TBB.

Infoscience - École polytechnique fédérale de Lausanne

Cache-Aware Lock-Free Concurrent Hash Tries

Authors: Bagwell Phil; Odersky Martin; Prokopec Aleksandar
Publication date: 14/06/2011

This report describes an implementation of a non-blocking concurrent shared-memory hash trie based on single-word compare-and-swap instructions. Insert, lookup and remove operations modifying different parts of the hash trie can be run independently of each other and do not contend. Remove operations ensure that the unneeded memory is freed and that the trie is kept compact. Pseudocode for these operations is presented and a proof of correctness is given: we show that the implementation is linearizable and lock-free. Finally, benchmarks are presented which compare concurrent hash trie operations against the corresponding operations on other concurrent data structures, showing their performance and scalability.

Infoscience - École polytechnique fédérale de Lausanne